In this paper, we focus on the tourism application scenario and its specific
requirements. We discuss a novel RS approach that copes with the specific
application constraints of the domain and produces recommendations that better
match the true needs of tourists. We illustrate the proposed next POI
recommendation approach in a case study and we compare it with a state-of-the-art
nearest neighbor-based next item RS. With the analysis of this case study, we
aim at illustrating the specific features of the compared approaches, also with the
goal of stimulating discussion on RS validation methods, with particular
attention to tourism applications. We finally discuss some significant limitations of
current evaluation approaches that must be addressed in future studies.


INTRODUCTION


Recommender systems (RSs) are personalized information
search and discovery applications helping users to identify
and choose useful items and information (Jannach et al.
2016; Ricci, Rokach, and Shapira 2015). RSs are nowadays
very popular on streaming platforms (e.g., Netflix and
Spotify) and eCommerce websites (e.g., Amazon). In this
paper, we focus on the tourism application scenario and
its specific requirements (Staab et al. 2002; Werthner
et al. 2015; Werthner and Ricci 2004; Rabanser and Ricci
2005). In particular, we concentrate on a typical tourist’s
information search task: finding novel and compelling
points of interest (POIs) to visit, and eventually extending an
already initiated or planned visit itinerary to a destination,
for example, a city (Braunhofer, Elahi, and Ricci 2015).
Tourists often face this sequential decision-making
problem, while planning their visit to a destination or
when at the destination continuing an already initiated
trajectory of visited POIs (Staab et al. 2002). We note that
in tourism, quite differently from the above-mentioned
applications (movies, music, eCommerce), there is no
clearly defined catalog of recommendable items. In
fact, what is recognized as a point of interest for some 
tourists may not be seen as a tourism target for others. For 
instance, while an Italian tourist may be advised to
visit a small town in a region near her place of residence, this
will not be recognized as a compelling target by a Japanese
tourist, who will instead consider the whole of Italy as a
possible destination, perhaps as an alternative to France (Hwang,
Gretzel, and Fesenmaier 2002). So, it might be critical for
an RS to help any type of tourist, at the decision/choice
point, to find POIs that can be recognized as interesting
targets, based on the tourist’s culture, knowledge, and
personality (Gretzel et al. 2004). Moreover, POIs are worth
visiting because they generate experiences, and the
quality of these experiences is hard to estimate fully
beforehand, at planning time. Hence, first, the RS should
be able to “persuade” the tourist of the quality of its
recommendations, since, as we said, we cannot expect
that such quality can be fully assessed on the basis of the
provided information, especially if the POI is not already
known to the tourist (Gretzel and Fesenmaier 2006).
Second, the recommended POIs, when they are actually
visited, must satisfy the tourist, give a “reward,” and create
a memorable experience (Gretzel et al. 2015).

In this article, we discuss the difficulties that these 
two goals create for the design of an RS in the tourism
domain. We note that RS research has already tackled the 
problem of next item recommendation (Hariri, Mobasher, 
and Burke 2012; Hashemi and Kamps 2017; Jannach 
and Lerche 2017; Ludewig and Jannach 2018; Quadrana, 
Cremonesi, and Jannach 2018; Shani, Heckerman, and 
Brafman 2005; Zhang, Chow, and Li 2014; Moling, 
Baltrunas, and Ricci 2012), but the state-of-the-art solu- 
tions, while being generally applicable to a wide range of 
application domains, have failed to address the specific 
needs of tourists. In fact, major players of the online 
tourism market, such as Booking.com or Tripadvisor.com, 
have not yet adopted these sophisticated solutions and 
nowadays they offer a recommendation functionality 
that is not personalized: it is either based on the average 
opinion of the users or on the items’ popularity. However, 
for business reasons, they do consider, when
generating recommendations for tourists, the constraints and
goals imposed by the suppliers’ side (Abdollahpouri et al.
2020). Hence, ultimately, this RS application domain has
not grown at the same fast pace that other domains
have seen.

One of the causes of the slow development of RS
applications in tourism is surely related to the difficulty of
acquiring information about the true user behavior, that is,
the sequence of experiences that travelers have. So,
while their online information search activity is easy to
track (Choe, Fesenmaier, and Vogt 2017), their true
experiences, that is, the POIs they visit, are only known
indirectly, in the form of selected reviews, which only specific
travelers (bloggers) usually provide (Marchiori, Cantoni, 
and Fesenmaier 2013; Zhang and Fesenmaier 2018). This 
is substantially different from other domains; in Netflix, 
for instance, the users’ watching behavior is easily tracked 
and users can express their “like” for a movie by just one 
click (Auksorncherdchoo and Sukstrienwong 2018; Krish- 
namurthy and Wills 2009; Castelluccia, De Cristofaro, and 
Perito 2010). So, RSs in the tourism domain suffer from a 
continuous state of “coldness”: they do not have enough 
users’ preference data to generate effective and personal- 
ized recommendations (Elahi et al. 2018). From a more 
technical point of view, we argue that the unsatisfactory
results of current tourism RSs also stem from the use of
standard recommendation models, which are optimized to
precisely predict the observed tourist behavior and
therefore offer suggestions that match, as precisely as
possible, what the single tourist is observed to do. But
tourists are rarely experts, especially when visiting new
destinations, and their behavior is typically exploratory.
So their observed behavior, however scarce, cannot be directly
used as model training data or as ground truth for
measuring the quality of the recommendation model. In fact, an
important goal of an RS is to support “knowledge discov- 
ery,” and this is particularly true in the tourism domain: 
recommendations should indicate novel items that the 
user is not aware of, but will like (Werthner et al. 2015). 

In order to address these requirements and issues, we 
discuss in this paper a novel RS approach that copes with 
the specific application constraints of the domain and 
produces recommendations that better match the true 
needs of the tourists (Massimo and Ricci 2018a; 2021a). 
This recommendation approach is implemented in three 
steps. First, clusters of tourists with a similar observed 
behavior are created. We note that tourists are normally 
classified into standard prototypical types (Yiannakis and
Gibson 1992). In our approach, a cluster corresponds to a
type of tourist, but these clusters are not defined a priori,
as in the cited tourism literature. Instead, clusters are
computed by running a clustering algorithm directly on
the (scarce) observed behavior data, which consists of
the trajectories of successive POI visits in a city that are
performed by a collection of observed tourists. Moreover,
the obtained clusters of tourists depend on a specific
representation of the visit trajectories, which we define and
which comprises features related to the content of the visited
POIs (e.g., the historical period of the POI) and the visit
context (e.g., the part of the day when the POI was visited).

In a second step, for each identified cluster of tourists, a 
behavior model of the sequential decision-making process 
of the tourist is built. The behavioral model determines 
which POIs a tourist in a cluster will likely choose next, 
that is, after having chosen other POIs, and how much 
“reward” the tourist is estimated to obtain by a POI visit. 
The behavioral model is learnt via Inverse Reinforcement 
Learning (IRL) (Abbeel and Ng 2004; Babes-Vroman et al. 
2011) and it is only based on the observed behavior, that 
is, tourists are not supposed to give any explicit feedback 
on their past POI visit experiences. However, the learning 
procedure implicitly assumes that tourists aim at maximiz- 
ing an unknown reward function that is actually estimated 
by the learning algorithm. We note that, by building a
behavioral model for each cluster, the model, while not
being specific to the individual, as is common in RSs, is not
completely general either (one single model for all), as in the
above-mentioned industry solutions. We note as well that
the main rationale for clustering tourists and building a
behavioral model for each cluster is the above-mentioned
“coldness” of the available data: there is rarely enough
previously observed individual behavior data
to build a fully personalized and individualized model.

In the third step of the proposed recommendation 
approach, the learnt behavioral models, one for each 
cluster of tourists, are leveraged for building next POI 
recommendations that have the characteristics of the
POIs typically visited by the tourists in the same cluster 
of the target tourist. We stress that, differently from more 
traditional approaches used in session-based RSs, which 
tend to recommend the items more likely to be consumed 
by the target user, the proposed approach tries to identify 
the items (POIs) that will be perceived as having, and will 
actually give, a larger “reward” to the tourist. The reward 
is a system proxy for the satisfaction of the experience 
of the POI. This is achieved by implementing alternative 
heuristics, aimed at balancing these two, possibly con- 
flicting goals: identify POIs that the tourist can recognize 
as relevant, before experiencing them, but also that will 
produce satisfying experiences when actually visited. 
These alternative heuristics are called “recommendation 
strategies” and prioritize specific characteristics of the 
generated recommendations; hence they are not limited
to maximizing recommendation accuracy, as in more
traditional approaches. A key ingredient of the proposed
recommendation strategies is instead the maximization 
of the estimated reward that a tourist can obtain from the 
recommended experiences (POIs), that is, we try to prior- 
itize the quality of the experience of a recommended POI, 
rather than the accuracy in matching the observed behavior.
However, the probability that the tourist will recognize,
before the visit, that the recommended POI matches her
preferences is an important element to consider, and we
therefore also offer a hybrid solution aimed at attaining
this goal as well.

In this article, we illustrate the proposed next POI 
recommendation approach in a case study and we com- 
pare it with a state-of-the-art nearest neighbor-based 
next item RS. With the analysis of this case study, we 
aim at illustrating the specific features of the compared 
approaches, also with the goal of stimulating discussion on
RS validation methods, with particular attention to
tourism applications. In particular, we illustrate to what
extent results obtained in an offline evaluation study
are confirmed in a user study. But we also discuss some
significant limitations of both evaluation approaches that 
must be addressed in future studies. 

This paper is organized as follows. Section “Recom- 
mender Systems for Tourism” presents an overview 
of Tourism RSs developed in industry and academia, 
summarizing open challenges. Section “Next POI Rec- 
ommendation” introduces our next-POI recommendation 
approach and Section “Evaluating Tourist RSs” discusses 
important issues arising in the evaluation of tourism RSs. 
Section “Offline and Online Next POI Recommendations” 
illustrates the experimental results we collected by means 
of offline and online evaluation studies. Finally, we discuss 
challenges and future research directions in Section “Open 
Challenges for Tourism Recommender Systems”. 


RECOMMENDER SYSTEMS FOR TOURISM


Even though tourism applications of RSs have attracted 
less attention compared to, for instance, mainstream
music and movie applications, the next-POI
recommendation problem has received some specific recognition
(Adamczak et al. 2020). In this application problem, clearly, the
sequential nature of item consumption plays a
relevant role (Dellaert, Ettema, and Lindh 1998). Moreover,
next-POI recommendation solutions have tried to address 
an important challenge of the domain, which is the lack of 
individual data about tourists’ POI visits. In fact, tourists 
do have privacy concerns (Poikela et al. 2015; Perentis, 
Vescovi, and Lepri 2015) and many tourists are reluctant 
to share their location with companies. As a consequence,
for each single tourist, the set of opinions about
the visited POIs, for example, booked hotels or
attractions, could be very small or even empty (Bin et al. 2019).
To partially circumvent this problem, many studies deal- 
ing with next-POI recommendation use data derived from 
social networks (Baraglia et al. 2013; Oppokhonov et al. 
2017; Palumbo, Rizzo, and Baralis 2017; Sanchez and Bel- 
login 2020). It is worth noting that social network users 
do not represent the full spectrum of tourists, and the core 
problem of acquiring unbiased and representative behav- 
ioral data remains. However, for this population of social 
networks users, by leveraging check-in data or geo-tagged 
media content uploaded by users on web platforms, it is 
possible to reconstruct their (partial) POI visit activities, for 
example, during a visit to a city (Silva et al. 2019). Hence, 
nowadays industrial players with their social network
platforms, like Google, Foursquare, and Facebook, are in a
much better position to implement next POI
recommendation solutions, even compared with players of the
tourism market.

In general, we must observe that many state-of-the-art
solutions tackle the next-POI recommendation
problem without appropriately considering the typology of the
POI, in any tourism-related classification of the POIs, and
without considering the context of the visit, for
example, with whom the tourist visited the POI (Oppokhonov
et al. 2017; Huang and Gartner 2014; Wang et al. 2018).
Hence, state-of-the-art solutions do not try to
“understand” what conditions and features make a POI worth
visiting for a specific tourist. These solutions reuse
trajectory data mining approaches (Zheng 2015) where it
is assumed that only spatio-temporal aspects define the
similarity of POI-visit trajectories performed by tourists.
Understanding the motivations that steer tourists to make
specific choices is left aside. A common pitfall of these
solutions can be found, for instance, in Torrijos,
Bellogin, and Sanchez (2020) where, in order to identify
tourists interested in a target POI, important
information related to the POI visits, for example, the weather
conditions at visit time and the type of visited POI, is
neglected, while more easily measurable properties
borrowed from trajectory mining techniques, such as the
distance between the point coordinates of the shapes described
by the POI visit trajectories, are considered. In
general, state-of-the-art solutions, such as nearest
neighbor-based RSs, leverage the similarity of POI-visit trajectories,
and generate next-POI recommendations by mining fre- 
quent patterns in similar trajectories (Hariri, Mobasher, 
and Burke 2012; Jannach and Lerche 2017; Sanchez and 
Bellogin 2020). 

Another line of research in the state of the art relates
to identifying distinct typologies of tourists by
clustering them on the basis of features derived from their
traits or behavior (Palumbo, Rizzo, and Baralis 2017; Yao 
et al. 2017). In Palumbo, Rizzo, and Baralis (2017), clus- 
ters of tourists are identified by leveraging demographic 
information acquired from social media platforms. The 
reconstructed POI-visit trajectories are enriched with fea- 
tures describing the category of each POI. The authors 
try to identify POI categories relevant for a target tourist 
but the final step of producing recommendations is not 
addressed. In another solution based on check-in data (Yao 
et al. 2017), the authors propose to use a deep neural net- 
work to extract behavior features that capture space- and 
time-invariant characteristics of trajectories collected from 
social networks. 

A more sophisticated clustering approach for next-POI 
recommendation is presented in McKenzie and Janowicz 
(2014). Given the user’s preferences over places derived 
from a location-based social network, the model finds sim- 
ilar individuals based on properties of the preferred items 
and recommends places based on related preferences of 
these similar individuals. Clustering is applied to users’ 
check-in data to identify individuals’ daily activities. For
each cluster, the POI that best represents the cluster is
identified as the “typical activity.” By considering weekday
and weekend activities, a user is characterized by
specific activities to be performed on those specific days.
Recommendations are then generated by user-to-user 
collaborative filtering. 

Interestingly, and somewhat related to the topic of clus- 
tering tourists, GroupTourRec (Lim et al. 2016) is a system 
that includes the functionality to form groups of homo- 
geneous people, by identifying POIs appropriate to each 
group and assigning a guide to each group. Hence, here 
clustering is used for forming groups of users to travel 
together; users are independent travelers and are clus- 
tered together according to their behavior. The suggestions 
of POIs to visit are generated by solving an orienteering 
problem rather than using predictive techniques. 


The sequential nature of the item consumption in 
tourism plays a relevant role in the itinerary recommenda- 
tion solution proposed in Herzog, Laß, and Worndl (2018) 
and Rani, Kholidah, and Huda (2018). Here the supported 
task is to advise the tourist while planning the visit activ- 
ity. In Rani, Kholidah, and Huda (2018), the authors aim at 
finding an optimal itinerary recommendation in terms of
distance and travel time. They start from the assumption that
the user has already identified the POIs she wants to visit 
and the number of days she will spend in the region. In 
this situation, a clustering algorithm distributes the POIs 
in clusters that correspond to the available days. Then 
a traveling salesman problem algorithm determines the 
actual visit order. We observe that this solution assumes 
that tourists are already knowledgeable about a place and 
they already know what they want to visit. 

In Worndl, Hefele, and Herzog (2017), the authors 
present a travel RS that recommends a list of POIs that 
the tourist does not necessarily know in advance. Given 
a start and an endpoint, an itinerary is built by using a 
custom shortest path algorithm that optimizes user prefer- 
ences over POI categories and time constraints objectives. 
The estimated suitability of the POIs for the itinerary is 
based on the tourist’s stated preferences and the POIs’
reputation, which is derived from the ratings and the number
of votes collected from a social network.

As we already mentioned at the beginning of the sec- 
tion, the surveyed approaches tend to ignore an important 
dimension of the tourist POI experience: the context of 
the visit. Contextual factors, such as visiting a POI on a
“sunny day” with the “family” during the “spring
holidays,” influence not only the tourists’ choices but also their
memories (Lamsfus et al. 2014; Matzarakis 2006). In Hong
et al. (2019), the authors investigate how the cultural
dimension influences the acceptance of the
recommendation. In fact, a tourist’s culture is intertwined with the
visit context (Savard and Mizoguchi 2019) and they jointly
affect the users’ preferences and experiences. The authors
propose to use clustering and dimensionality reduction
techniques to identify cross-cultural factors that are
leveraged in the prediction of POI recommendations. More
generally, previous literature on context modeling in tourism
has dealt with the temporal context (Sanchez and Bellogin
2020; Zhao et al. 2019), but only a few authors have also considered
the categories of the POIs when dealing with contextual
effects (Li et al. 2019; 2020).

We point out that most of the itinerary recommenda- 
tion approaches that have been proposed in the past tend 
to reinforce the consumption of POIs that are popular 
and often already known by the users. Moreover, no past
approach has tried to model and leverage the “reward”
that tourists obtain by visiting the recommended POIs.
While precisely defining such a reward is difficult, our
approach tries to capture such a hidden reward by
making the assumption that, overall, tourists make and report
visits that are rewarding for them, and only by mistake
do they visit POIs that lack this property. Hence, by
leveraging a specific learning approach, namely IRL, aimed at
learning the hidden reward that motivates the
decision maker, we try to generate recommendations that have
the characteristics of the items preferred by the tourist.

Moreover, very little attention has been given to the 
proper, user-based, evaluation of next-POI RSs, which is 
clearly due to the planning and management costs inher- 
ent to these evaluation methods (Gunawardana and Shani 
2015). User studies in the travel domain can be found 
in Braunhofer, Elahi, and Ricci (2014), Nguyen and Ricci 
(2018), Herzog and Worndl (2019), but the focus of these 
works was not on next-POI recommendations. 


NEXT POI RECOMMENDATION 


We focus on a scenario where the RS is used to assist 
tourists in sequential decision-making, that is, in facing 
the next-POI recommendation problem: looking for an 
additional POI to visit after having visited some other 
POIs (Massimo and Ricci 2018a; 2021a). We present here 
the three-step approach that we have sketched in the intro- 
duction. In the rest of this section, we assume that there 
is a data set of observed visit trajectories of a collection of 
tourists that is used to learn the behavior model. Each visit 
trajectory is composed of a sequence of POI visits. Each
POI visit is described by a visited POI and a set of contex- 
tual conditions observed at the visit time, for example, the 
weather conditions. 


Clustering tourists’ visit trajectories 


First, we cluster tourists’ visit trajectories into groups of
trajectories related to a common topic. These clusters are 
extracted directly from the analysis of tourists’ behavior, 
after having identified a set of features that can be used to 
describe the content of a POI and the context of the visit to 
the POI. 

In order to identify such clusters, we represent each 
observed tourist’s POI-visit trajectory in a document-like 
format, where the terms of a trajectory document are the 
content and context features describing the POI visits con- 
tained in the trajectory. Hence, this representation of the 
visit trajectory captures different dimensions that charac- 
terize the traveler experience: the context of the POI visits, 
for example, the part of the day and weather when the visit 
occurred; and what is visited, for example, POI category 
and historic period (Massimo and Ricci 2020). By doing so,
we abstract from the visit order and the identity of the
specific visited POIs, and we focus on what may interest the
tourist (content features) and in which conditions (context 
features). It is important to note that in order to succeed 
in the identification of clusters that can really correspond 
to meaningful tourist typologies, it is fundamental to lever- 
age the “right” set of descriptive features to represent the 
visit to a POI. This is an activity that we have performed by 
leveraging specific domain knowledge (see Massimo and 
Ricci (2020) for more details). 
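
As a minimal sketch of this document-like representation, the Python
snippet below turns a visit trajectory into a bag of content and context
terms. The field names (`category`, `period`, `daytime`, `weather`) and the
sample values are illustrative assumptions, not the feature set actually
used in Massimo and Ricci (2020).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PoiVisit:
    """One POI visit; all field names are hypothetical."""
    poi_id: str      # identity of the POI (not used as a term)
    category: str    # content feature, e.g. "palace"
    period: str      # content feature, e.g. "15th century"
    daytime: str     # context feature, e.g. "morning"
    weather: str     # context feature, e.g. "cold"


def trajectory_to_document(trajectory):
    """Return term counts, ignoring visit order and POI identity."""
    terms = Counter()
    for visit in trajectory:
        terms.update([visit.category, visit.period,
                      visit.daytime, visit.weather])
    return terms


trajectory = [
    PoiVisit("poi_1", "palace", "15th century", "morning", "cold"),
    PoiVisit("poi_2", "square", "15th century", "morning", "cold"),
]
document = trajectory_to_document(trajectory)
```

Because only term counts are kept, two trajectories that contain the same
visits in a different order map to the same document, which is the
abstraction from visit order described above.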

Then, to form the required clusters of POI-visit trajec- 
tories, we used a topic model approach based on non- 
negative matrix factorization (Massimo and Ricci 2018a). 
This method allows us to identify a small number of hid- 
den topics in the document-trajectory collection. A topic 
is described by a collection of terms: those most related to
the topic. For instance, by using the data set of visit
trajectories described later in the paper, we have identified
five topics, and one of these (hidden) topics is associated
with trajectories that are characterized by the terms:
morning, cold, square, palace, 15th century (the full description
of these topics can be found in Massimo and Ricci (2020)).
Hence, in the observed set of visits, a group of tourists
seems to be interested in visiting palaces and squares of the
15th century on cold mornings. The clusters of visit
trajectories are then defined by grouping together the trajectories
most strongly associated with the identified topics, that is,
one topic defines one cluster, and a trajectory can belong
to more than one cluster.
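
The clustering step can be sketched as follows, under stated assumptions.
The snippet factorizes a small trajectory-by-term count matrix with the
classic multiplicative-update rules for non-negative matrix factorization;
the source does not say which NMF solver was used, so this choice is an
assumption. The toy matrix and the `threshold` rule that lets a trajectory
join several clusters are hypothetical as well.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - W H||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (trajectories x terms) as W H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(n_topics)] for i in range(n)]
    return w, h


def clusters_from_topics(w, threshold=0.5):
    """Put a trajectory in every topic whose weight is at least
    `threshold` times its strongest topic weight (hypothetical rule)."""
    clusters = {k: [] for k in range(len(w[0]))}
    for i, row in enumerate(w):
        top = max(row)
        for k, weight in enumerate(row):
            if top > 0 and weight >= threshold * top:
                clusters[k].append(i)
    return clusters


# Rows: trajectories; columns: counts of the terms
# [morning, cold, palace, evening, warm, park] (toy data).
V = [[2, 2, 1, 0, 0, 0],
     [1, 2, 2, 0, 0, 0],
     [0, 0, 0, 2, 1, 2],
     [0, 0, 0, 1, 2, 1]]
W, H = nmf(V, n_topics=2)
clusters = clusters_from_topics(W)
```

Each row of `H` lists the term weights of one topic, and each row of `W`
gives the strength of the association between a trajectory and the topics.
The relative threshold is one simple way to let a trajectory belong to more
than one cluster.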

The main benefit of this approach resides in the fact 
that we can identify groups of related visit trajectories, 
even when dealing with small datasets of observed
tourists’ choices. Besides, even if we had at our disposal
many POI-visit trajectories for each tourist, they would still
reveal a restricted set of preferences, which are biased by
the tourist’s limited knowledge of the destination. So, these
trajectories may also contain suboptimal choices. Cluster- 
ing is a first step to overcome these problems: suboptimal 
choices made by one tourist may be compensated by better 
choices of other tourists in the same cluster (by assum- 
ing that not all tourists make the same errors). Learning 
a behavioral model for each cluster is the second step to 
extract from the observed visit trajectories a useful model 
of the true preferences of the tourist. 


Tourists’ behavior learning


We want to learn the user behavior models that character- 
ize the tourists’ typologies captured by the generated clus- 
ters. This means that we want to estimate the unknown 
reward that tourists in a cluster seem to optimize in their 
behavior, that is, by performing the observed POI visits 
in that order. The proposed approach does not assume
that tourists are completely aware of what makes a POI
visit rewarding, but tries to extract the rationale just by
observing the characteristics and the context of the
visited POIs. For instance, the proposed approach seeks to
estimate the reward that a tourist who visits, for
example, the Colosseum in Rome obtains by visiting
Fontana di Trevi as the next POI and Villa Borghese thereafter.
Moreover, the proposed approach tries to determine which next
POIs the tourist should choose, step by step, after
Villa Borghese.

We use a standard Markov Decision Problem (MDP)
model to frame the tourist’s POI-visit decision-making
task (Abbeel and Ng 2004). An MDP is a tuple $(S, A, T, r, \gamma)$.
$S$ is a finite set of states, and, in our case, a state represents
a visit to a POI under specific contextual conditions, for
example, visiting the Colosseum during a sunny day. $A$ is
a finite set of actions: moving to one of the available POIs.
$T$ is a finite set of probabilities: $T(s' \mid s, a)$ is the observed
probability of making a transition from state $s$ to $s'$ when
action $a$ is performed. These probabilities account for the
possibility that when the tourist decides to make the action
to visit a next POI, for example, Fontana di Trevi,
contextual conditions, such as the weather, may change in an
unexpected way; hence the reached state is not uniquely
determined by the performed visit action. The function
$r : S \to \mathbb{R}$ models the reward the decision maker obtains
from acting in a certain way, namely by being in a state, that
is, by visiting a specific POI in a particular context. This
function is unknown in our application scenario, because
we do not assume that the tourist gives explicit
feedback (e.g., a rating or a like), and therefore, the reward
function must be learnt by using only the observed POI
visits. Finally, $\gamma \in [0, 1]$ is a parameter measuring how much
rewards from visits performed later in a visit trajectory are
discounted with respect to the immediate ones: a reward
received $k$ visits after the current visit is worth only $\gamma^{k-1}$
times what it would be worth if it were received
immediately. The lower the value of $\gamma$, the more myopic the
decision maker is, that is, the more he tries to optimize the
immediate reward rather than the reward that can be obtained
from the subsequent visits.
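
To make the role of the discount parameter concrete, here is a small
Python sketch in which the reward of the k-th next visit is weighted by the
discount raised to the power k - 1. The reward values are hypothetical and
chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th next visit (k = 1, 2, ...) is
    weighted by gamma ** (k - 1)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum(gamma ** (k - 1) * r
               for k, r in enumerate(rewards, start=1))


# Hypothetical rewards of the next three visits.
rewards = [1.0, 1.0, 4.0]
myopic = discounted_return(rewards, gamma=0.0)      # only the next visit counts
farsighted = discounted_return(rewards, gamma=0.5)  # 1 + 0.5 * 1 + 0.25 * 4
```

With a discount of 0 the highly rewarding third visit is ignored entirely,
whereas with 0.5 it still contributes a quarter of its reward.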

Given the MDP associated with a cluster, which models
the common decision problem faced by the tourists whose
visit trajectories are contained in the cluster, the behavioral
model for this cluster is a decision policy $\pi^* : S \to A$ that
maximizes the cumulative reward that the decision maker
obtains by acting according to $\pi^*$ (optimal policy). The
value of taking a specific action $a$ in state $s$ under a policy
$\pi$ is indicated with $Q_\pi(s, a)$, and it is the (expected)
discounted cumulative reward obtained by making the next
POI visit $a$ in state $s$ and then continuing to make
successive visits by following the policy $\pi$. The optimal policy
$\pi^*$ dictates to the decision maker in state $s$ to perform the
action that maximizes the value function $Q_{\pi^*}$.
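
The relation between Q-values and the optimal policy can be illustrated
with a toy example in which, unlike in our scenario, the reward is assumed
to be known. The sketch below runs value iteration on a hypothetical MDP
with two POIs and two weather conditions; the POIs, probabilities, rewards,
and the convention that the reward is collected in the reached state are
all assumptions made for illustration.

```python
def q_value_iteration(states, actions, T, r, gamma, n_iter=100):
    """Approximate the optimal Q-values of a finite MDP.

    T[(s, a)] maps next states to probabilities; r[s] is the reward of
    being in state s; gamma should be < 1 for convergence.
    """
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(n_iter):
        # Value of a state: best Q-value over the available actions.
        v = {s: max(q[(s, a)] for a in actions) for s in states}
        q = {(s, a): sum(p * (r[s2] + gamma * v[s2])
                         for s2, p in T[(s, a)].items())
             for s in states for a in actions}
    return q


def greedy_policy(q, states, actions):
    """In each state, pick the action with the largest Q-value."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


pois = ["trevi", "borghese"]
weathers = {"sunny": 0.7, "rainy": 0.3}  # assumed weather probabilities
states = [(p, w) for p in pois for w in weathers]
actions = pois                           # action = POI to visit next
# Visiting POI `a` leads to state (a, weather), with random weather.
T = {(s, a): {(a, w): p for w, p in weathers.items()}
     for s in states for a in actions}
r = {("trevi", "sunny"): 1.0, ("trevi", "rainy"): 0.2,
     ("borghese", "sunny"): 2.0, ("borghese", "rainy"): 0.0}
q = q_value_iteration(states, actions, T, r, gamma=0.5)
policy = greedy_policy(q, states, actions)
```

In this toy problem nothing prevents revisiting a POI, so the greedy policy
simply keeps choosing the POI with the largest expected reward; a realistic
state representation would also have to account for the visits already
made.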

Since, as we said, the reward function is unknown, the
optimal policy $\pi^*$, that is, the optimal behavior of the
decision maker, cannot be determined with standard
reinforcement learning algorithms (Sutton and Barto 1998).
Instead, in this case, IRL can be used (Abbeel and Ng
2004; Ermon et al. 2015). IRL makes it possible to identify both the
reward function that the decision maker seems to
optimize and the optimal policy for that reward function. In
other words, by using IRL one can estimate how tourists
in a cluster behave, what reward they seem to obtain from
visits to different POIs, and, in any possible state, the next best
visit that they should make.

The reward function $r$ and the associated optimal action
selection policy $\pi^*$ that are computed by IRL strictly
depend on the observed (clustered) POI-visit trajectories
but also on the selected state feature function $\phi : S \to \mathbb{R}^n$
that assigns to each state a vector of feature values ($n$ is
the number of features). We also observe that when IRL
is used, an a priori constraint on the form of the
reward function must be imposed, so that the problem can
actually be solved. Hence, as in Abbeel and Ng (2004),
we assume that $r$ is a linear function, $r(s) = \theta^\top \phi(s)$, of
the state $s$ feature vector $\phi(s)$. The vector of parameters $\theta$
models the unknown decision maker’s preference for the
state features. Hence, we make a simplifying assumption
on the structure of the tourists’ preferences: the reward
grows when the visit to the POI is described by the features
(content and context) that the user prefers.
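
The linear reward assumption can be sketched in a few lines of Python. The
snippet also computes the empirical discounted feature expectations of a
set of trajectories, the statistic that the apprenticeship-learning
formulation of Abbeel and Ng (2004) matches; whether the IRL variant used
here relies on exactly this statistic is not stated in the text. The state
names, the three binary features, and the preference vector `theta` are
hypothetical.

```python
def linear_reward(theta, phi_s):
    """Linear reward: dot product of preferences and state features."""
    return sum(t * f for t, f in zip(theta, phi_s))


def feature_expectations(trajectories, phi, gamma):
    """Average over trajectories of the discounted sum of the feature
    vectors of the visited states."""
    n = len(next(iter(phi.values())))
    mu = [0.0] * n
    for traj in trajectories:
        for t, s in enumerate(traj):
            for i, f in enumerate(phi[s]):
                mu[i] += (gamma ** t) * f
    return [m / len(trajectories) for m in mu]


# Hypothetical binary features: [is_palace, is_square, is_morning].
phi = {"palace_morning": [1, 0, 1],
       "square_evening": [0, 1, 0]}
theta = [1.0, 0.5, 0.25]  # assumed preference weights

reward = linear_reward(theta, phi["palace_morning"])
trajectories = [["palace_morning", "square_evening"],
                ["square_evening"]]
mu = feature_expectations(trajectories, phi, gamma=0.5)
# With a linear reward, the average discounted return of the
# trajectories equals the dot product of theta and mu.
value = linear_reward(theta, mu)
```

The last line shows why the linear form is convenient: the value of the
observed behavior depends on the trajectories only through their feature
expectations.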

Moreover, by using IRL we implicitly assume that a
tourist is a rational decision-maker, seeking to optimize an
(unknown) reward determined by the visited POIs. Such
an agent is typically referred to as an “expert,” because
the observed behavior is assumed to be dictated by
knowledge. However, it is difficult to believe that tourists are true
“experts,” that is, the observed behavior surely contains
suboptimal choices: for instance, tourists may repeatedly
visit a few popular POIs. Learning the user behavior from
a cluster of POI-visit trajectories of tourists is actually
aimed at taming the problems related to the presence of
suboptimal choices: suboptimal choices, if not correlated, will
not jeopardize the learned behavior model.


Recommendation strategies 


Having learned a behavioral model for each cluster of 
tourists, we propose to use it to suggest next-POI visits to 
the tourists in that cluster (Massimo and Ricci 2018a; 2020; 
2021a). We recall here the important assumptions on a
suitable next-POI RS that we discussed in the introduction. We
do not want to generate recommendations equal to those
actually consumed by the target tourist; the recommended
POIs must be perceived as valuable and must offer
rewarding experiences to the tourist. In order to accomplish this
goal, we consider alternative heuristics, aimed at balancing
these two possibly conflicting goals. These heuristics are
called “recommendation strategies.” They prioritize specific
characteristics of the generated recommendations and hence
are not limited to maximizing recommendation precision,
as in traditional approaches. Several strategies may
be implemented, and we hope to see further developments
in this direction. In this paper, we exemplify this analysis
by considering two of them.

The first one is called Q-BASE and it directly exploits
the learnt user behavior model of the cluster the tourist
belongs to. Q-BASE recommends, as the next POI visit action,
the optimal one according to the optimal decision policy
learned in the tourist’s cluster. The optimal visit action has
the largest Q value in the current user state. Hence, if the
tourist makes this choice and continues to make successive
POI visits by choosing the actions with the largest
Q value, which are recommended by Q-BASE, then the
obtained cumulative reward will be maximized. Q-BASE is
therefore a recommendation strategy that tries to suggest
not only the most satisfying immediate next POI visit, but
also the visits that the tourist will be able to make after that
immediate next one. Moreover, since the reward is estimated
on the basis of the POI characteristics and visit contextual
conditions, Q-BASE can even recommend novel POIs, not
yet visited by tourists, provided that they have the
characteristics of the POIs visited by the tourists in the same
cluster, and are visited in the contextual conditions typically
preferred by the tourists in the same cluster.
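
The selection rule of Q-BASE can be sketched in a few lines of Python, assuming that the Q values learned for the tourist's cluster are available as a lookup table. The function name `q_base`, the state label, and all numeric values below are hypothetical; the sketch only shows the "largest Q value in the current state" rule and not how the Q values are learned.

```python
def q_base(q_values, state, candidates, k=1):
    """Rank candidate POIs by Q(state, poi), largest first, and return the top k."""
    ranked = sorted(candidates, key=lambda poi: q_values[(state, poi)], reverse=True)
    return ranked[:k]


# Hypothetical Q values for one state (the itinerary so far plus its context).
state = "after_piazza_della_signoria_sunny"
q_values = {
    (state, "Vasari Corridor"): 0.8,
    (state, "Old Bridge"): 0.3,
    (state, "Torre dei Pulci"): 0.6,
}
top = q_base(q_values, state, ["Old Bridge", "Torre dei Pulci", "Vasari Corridor"], k=2)
```

Because the ranking uses only the Q values, a POI that few tourists visited can still be ranked first, provided its features and context yield a high estimated cumulative reward.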

The second strategy acknowledges that tourists often
follow trends, being influenced by POI popularity and
fashion (Garcia 2004), which are easily communicated by
websites like TripAdvisor. While these aspects may not
influence the experience that the tourist will have by
visiting a POI (even though visiting popular POIs may be
considered a target for some tourists), they will certainly
influence the decisions of the tourist. It is well known that
“familiarity breeds liking.” For instance, in experiments
made with music, it has been found that people do not
select what they think they like but what they are more
familiar with (Madison and Schiölde 2017). Tourists are not
different, and they often visit what are considered to be the
top attractions and frequently mentioned POIs (Moutinho
1987; Swarbrooke and Horner 2006). Hence, in the second
recommendation strategy, which is called Q-POP PUSH, we
take that aspect into account and we generate
recommendations by averaging two criteria: the first is the cumulative
reward that can be obtained by making the next-POI visit,
as for Q-BASE, and the second is the popularity of the POIs,
which is estimated on the available visit trajectories.
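
The averaging of the two criteria can be sketched as follows. This is only an illustration under stated assumptions: the min-max rescaling of both criteria, the `weight` parameter, and the function names are our own choices for the example, and the Q values and visit counts are made up.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1]; constant inputs map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def q_pop_push(q_scores, visit_counts, weight=0.5, k=1):
    """Rank POIs by weight * reward + (1 - weight) * popularity (both rescaled)."""
    q_norm = min_max(q_scores)
    pop_norm = min_max({poi: visit_counts.get(poi, 0) for poi in q_scores})
    combined = {
        poi: weight * q_norm[poi] + (1 - weight) * pop_norm[poi] for poi in q_scores
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]


# Hypothetical Q values in the current state and visit counts in the trajectories.
q_scores = {"Vasari Corridor": 0.8, "Old Bridge": 0.5, "Torre dei Pulci": 0.4}
visit_counts = {"Vasari Corridor": 40, "Old Bridge": 400, "Torre dei Pulci": 10}
balanced = q_pop_push(q_scores, visit_counts, weight=0.5)
reward_only = q_pop_push(q_scores, visit_counts, weight=1.0)
```

With equal weights the popular POI overtakes the highest-reward one in this toy example, while `weight=1.0` reduces the rule to the Q-BASE ranking.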


EVALUATING TOURIST RSs 


The effectiveness of RSs has been assessed via offline anal- 
ysis, user studies and online testing (Gunawardana and 
Shani 2015). An offline analysis offers a quick and inex- 
pensive tool for evaluating the RS performance by using 
existing datasets of user-item interactions, and computing 
predefined metrics, which are mostly estimating the preci- 
sion of the RS (Karypis 2001; Cremonesi, Koren, and Turrin 
2010). Precision relates to the ability of a recommendation 
approach to predict either the observed user choices (e.g., 
the visited POIs) or the recorded evaluations for items (e.g., 
ratings for POIs). The prediction of user choices is typically 
assessed by computing information retrieval metrics: pre- 
cision and recall. Precision is computed as the fraction of 
the relevant items among the recommendations, whereas 
recall is the fraction of the relevant items that are recom- 
mended. The precision of the predicted item evaluations 
is instead measured by regression type error metrics, such 
as mean absolute error (MAE) or root mean square error 
(RMSE) (Gunawardana and Shani 2015; Herlocker et al. 
2004; Powers 2008). 
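
For concreteness, the metrics named above can be computed as in the following Python sketch. The function names and the toy inputs are ours; the definitions follow the standard ones given in the text (precision and recall over recommended versus relevant items, MAE and RMSE over predicted versus recorded evaluations).

```python
import math


def precision_recall(recommended, relevant):
    """Precision: hits / |recommended|. Recall: hits / |relevant|."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mae(predicted, actual):
    """Mean absolute error between predicted and recorded evaluations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between predicted and recorded evaluations."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Toy example: four recommended POIs, three POIs actually visited in the test set.
precision, recall = precision_recall(["A", "B", "C", "D"], ["B", "D", "E"])
rating_mae = mae([4, 3, 5], [5, 3, 3])
rating_rmse = rmse([4, 3, 5], [5, 3, 3])
```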

Researchers have pointed out that optimizing an RS for 
precision can even negatively affect the overall user expe- 
rience (McNee, Riedl, and Konstan 2006). In fact, striving 
for precision can lead to the recommendations of items 
that are often uninteresting, as they too closely match what 
the user already typically consumes and knows. Moreover, 
these recommendations self-reinforce the consumption of 
blockbuster items (Zhou et al. 2010; Ball 2010; McNee, 
Riedl, and Konstan 2006; Vargas and Castells 2011). Hence, 
it is argued that a proper assessment of an RS should 
be based on a broader set of indicators of recommenda- 
tion quality (Ball 2010; McNee, Riedl, and Konstan 2006; 
Gunawardana and Shani 2015), and the indicators must be 
properly selected in relation to the application goal of the 
RS. In Vargas and Castells (2011; 2014), the authors have
proposed specific metrics that complement precision in
order to measure the “novelty” of the recommendations.
Furthermore, in Kumar et al. (2017), an evaluation metric
is proposed that assesses how similar the properties of the
suggested items are to those in the test set. This makes it
possible to understand if the RS can suggest items different
from those previously consumed by the user but still similar
to them.

Hence, as the literature suggests, also in the evaluation 
of tourism RSs, it is fundamental to consider their practical 
usage and how tourists consume products, that is, POIs in 
our scenario. In fact, while searching for a POI to visit or a 
hotel to book, tourists rarely seek suggestions for items that 
they can autonomously find. Conversely, they are looking 
for relevant and rewarding discoveries. Specifically, they
expect to find items that they do not know yet, hence
novel, but also aligned with their preferences/needs,
and capable of generating memorable experiences and
satisfaction. Therefore, precision cannot be the sole metric
used to assess the quality of a tourist RS. The novelty of
the recommendations and the estimated reward that the user
can obtain by consuming them are important qualities of
the recommendations that have to be assessed.

Evaluating the novelty of recommendations in offline 
studies is hard and is only accomplished by measuring 
other properties of the recommendations that are asso- 
ciated with novelty, for example, the unpopularity: an 
item that is not popular in the observed choices of users, 
should also be novel when recommended (Gunawardana 
and Shani 2015). Moreover, a major drawback of offline 
studies lies in the fact that one must make the restric- 
tive assumption that only the evaluations (or the choices) 
present in the test set can be used to judge the qual- 
ity of the recommendations. Hence, the user’s previously 
observed behavior is considered as ground truth and novel 
behaviors, which could be proposed by the
recommendations, cannot be judged. Clearly, preferences for novel
items that an RS could suggest to the user are not present
in that set, that is, items not yet “evaluated” by the user
are all considered as bad recommendations and decrease
the estimated system’s precision. Moreover, the interaction
context cannot be considered in an offline study
(Adomavicius et al. 2011; Braunhofer and Ricci 2017). This
means that offline evaluations implicitly assume
that when the user evaluates an item already evaluated
in another context, the same evaluation would be given,
which is rarely the case. However, offline analysis of RS
performance allows comparing different RS variants at once, on
a broad set of metrics, and by utilizing various datasets.
These properties make offline evaluations powerful and
indispensable tools.

User and online studies do not have that flexibility:
only a few alternative RSs can be compared, by letting
the users try them, either in a controlled situation or
in the wild (Bellogin and Said 2018; Gunawardana and
Shani 2015; Knijnenburg et al. 2012; Pu, Chen, and Hu
2011). Conversely, in user and online studies, the
collected user/system interactions can be analyzed and the
users’ reactions even to “novel” recommendations can be
observed. The situation hence is very different from a
simulated offline evaluation where there is a single and static
reference set of assumed good recommendations, which
are the items in the user’s test set (Gunawardana and Shani
2015). In user and online studies, the tester is able to
analyze the recommended items and to decide which ones are
relevant, by using her specific idea of what a relevant
recommendation is. In other words, online there are no
stored “preferences” or choices that the RS must “predict,”
as offline, but preferences and choices are “constructed”
while the user is interacting with the RS (Bettman, Luce,
and Payne 1998). These studies are more expensive in terms
of invested time and planning. Users have to be found,
instructed, and placed in an ecologically valid setting: the
usage situation, the task to be performed, and the
interaction should closely match the real setting in which the user
actually interacts with the RS. Hence, such studies must be
planned with care, and because of their high cost, they are
often avoided in academic research.

In the next section, we will exemplify and further spec- 
ify these general problems in the comparative analysis of 
the proposed next POI recommendation approach that we 
have illustrated in the previous section. 


OFFLINE AND ONLINE EVALUATION OF 
NEXT POI RECOMMENDATIONS 


In our experiments, we have used a dataset of POI-visit 
trajectories derived from tourists’ activities on a social net- 
work. Specifically, individual POI-visit trajectories in the 
city of Florence (Italy) are reconstructed from geo-tagged 
pictures uploaded on the Flickr photo sharing platform
(Muntean et al. 2015) and have been augmented with the 
context of the visit, for example, weather summary or part 
of the day, and POI features, for example, POI-categories 
and reputation (Massimo and Ricci 2018b; 2021b). We 
mentioned in Section 2 that social network users do not 
represent the full spectrum of tourists, hence the results of 
the presented experiments should be considered as more 
indicative of the effect of RS on this particular segment of 
users. The total number of POI-visit trajectories that we 
have considered is 1663. A trajectory contains on average 
11.7 POI-visits, and the number of unique POIs is 532. We 
note that the trajectories/users ratio is 1.43. In practice, the
majority of the users in this dataset have just one visit
trajectory. This clearly makes it almost impossible to learn a
user-specific behavior model, that is, a distinct reward
function and optimal policy for each user, and it justifies the
proposed clustering-based approach.


Offline experiment 


In Table 1, we show the results of an offline experiment. 
Here we compare the performance of the two IRL-based 
recommendation strategies Q-BASE and Q-POP PUSH (see 
Section 1) with a popular next item RS baseline, SKNN. 
SKNN is a nearest neighbor next-item RS, not specifi- 
cally tailored to the considered tourism application. SKNN, 
given the POI-visit trajectory of a target user, seeks other 
users who performed similar visits and recommends the 
most frequent next POI-visit performed by these similar 
users (Ludewig and Jannach 2018). Hence, while SKNN
aims to predict the next POI the tourist will visit, Q-BASE,
as discussed in Section 1, tries to identify the POIs that
have the characteristics usually liked by similar users
(in the same cluster) and give the tourist the largest
(cumulative) reward.

TABLE 1 Offline analysis of next-POI recommendation performance (Top-1)

Model        Model description                Reward   Precision   Novelty
Q-BASE       Maximal reward                    0.073   0.043       0.061
Q-POP PUSH   Balance reward and popularity    -0.002   0.099       0.000
SKNN         Popular among similar visitors   -0.007   0.109       0.000
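
To make the contrast with the baseline concrete, the following Python sketch shows a simplified nearest-neighbor next-item recommender in the spirit of SKNN. It is not the implementation of Ludewig and Jannach (2018): the Jaccard similarity, the neighborhood size `k`, and the function names are assumptions made for the example, and the trajectories are invented.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two POI collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def sknn_next(current, past_trajectories, k=2, top_n=1):
    """Recommend the POIs most frequent among the k most similar trajectories."""
    neighbors = sorted(
        past_trajectories, key=lambda traj: jaccard(current, traj), reverse=True
    )[:k]
    visited = set(current)
    counts = Counter()
    for traj in neighbors:
        for poi in set(traj) - visited:  # skip POIs the target already visited
            counts[poi] += 1
    return [poi for poi, _ in counts.most_common(top_n)]


current = ["Old Bridge", "Piazza della Signoria"]
past = [
    ["Old Bridge", "Piazza della Signoria", "Giotto's Bell Tower"],
    ["Old Bridge", "Piazza della Signoria", "Giotto's Bell Tower", "Fountain of Neptune"],
    ["Basilica of Santa Croce", "Vasari Corridor"],
]
suggestion = sknn_next(current, past, k=2, top_n=1)
```

The score of a POI here depends only on how often similar visitors chose it, which is why such a baseline tends to favor popular POIs.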

The RS performance metrics shown in Table 1 are meant 
to address the requirements of a tourist next-POI RS and 
are: Reward, Precision, and Novelty. 

Reward is the average increase of the system-estimated
reward a tourist obtains if she acts as recommended rather
than as she did (test set). We note that the reward function
is estimated on the basis of the observed tourist behavior,
as is the novelty of the users’ visited POIs. Importantly,
a tourist can obtain an even larger reward by deviating
from the observed behavior: hence a less precise
recommendation can give a larger reward. This reflects the fact
that the observed behavior is not necessarily optimal, and
the proposed IRL-based recommendation strategies can
detect that, and suggest even better options than those
observed in the data. Hence, the reward metric measures a
recommendation quality quite different from precision.

Precision is the proportion of the recommendations
found in the test set. Finally, Novelty is the percentage of
the recommendations that cover the less popular items
in the data (see Massimo and Ricci (2018a; 2020)). We
have to resort to a proxy for measuring novelty since it is
impossible in an offline study to appropriately measure
the true novelty of a recommendation (Gunawardana and
Shani 2015). The true novelty of the recommendations will
instead be measured in the user study discussed below.
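
The three offline metrics can be sketched for the Top-1 case as follows. This is a simplified reading of the definitions above, not the exact procedure of Massimo and Ricci (2018a; 2020): the lookup table of estimated rewards, the popularity `threshold` used as the novelty proxy, and all values are hypothetical.

```python
def reward_gain(estimated_reward, cases):
    """Average of reward(state, recommended) - reward(state, observed)."""
    gains = [
        estimated_reward[(state, rec)] - estimated_reward[(state, obs)]
        for state, rec, obs in cases
    ]
    return sum(gains) / len(gains)


def top1_precision(cases):
    """Share of test cases where the recommendation equals the observed visit."""
    return sum(rec == obs for _, rec, obs in cases) / len(cases)


def novelty_proxy(cases, visit_counts, threshold):
    """Share of recommendations whose visit count is below the threshold."""
    return sum(visit_counts.get(rec, 0) < threshold for _, rec, _ in cases) / len(cases)


# Each case: (state, recommended POI, POI actually visited in the test set).
cases = [("s1", "A", "B"), ("s2", "C", "C")]
estimated_reward = {("s1", "A"): 1.0, ("s1", "B"): 0.5, ("s2", "C"): 0.25}
visit_counts = {"A": 3, "B": 100, "C": 80}

gain = reward_gain(estimated_reward, cases)
precision = top1_precision(cases)
novelty = novelty_proxy(cases, visit_counts, threshold=10)
```

In the toy data the first recommendation misses the observed visit, so it lowers precision, yet it raises both the reward gain and the novelty proxy; this is the trade-off the metrics are meant to expose.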

It is clear, by observing the results in Table 1, that Q-BASE
recommends next POI-visits that have higher reward and
are also more novel, at the cost of a lower precision.
Interestingly, we note that Q-POP PUSH, by trying to optimize
both the reward and the popularity of the recommended
next-POIs, loses the capability of Q-BASE to suggest high
reward POIs, and it performs substantially the same as SKNN.
It is worth noting (results not shown here for lack of space)
that with a better tuning of the weighted combination of the
reward and popularity criteria, Q-POP PUSH can achieve
the precision performance of SKNN while offering much
of the reward obtained by Q-BASE. These results point out
the difficult choice for the designer of a tourist RS: the RS
should be precise, but the implication is that it will then
often suggest popular items that are likely to be already
known by the tourist. Hence, in this way, the actual utility
of the RS will be limited. Q-BASE tries to recommend novel
(not popular) items that are estimated to be “rewarding”
for the user based on the fact that tourists in the same
cluster visit items similar to those recommended.

Clearly, the fact that the recommended next POIs are
actually relevant and useful can only be assessed by a user
or online study, which, however, presents other challenges:
is the user capable of assessing the satisfaction (reward)
that the true visit experience to the recommended POIs
will generate?


Online user study 


In order to better understand the users’ perceived novelty 
and expected satisfaction of the next-POI visit recommen- 
dations generated by Q-BASE, Q-POP PUSH, and SKNN, 
we have implemented a web-based application accessible 
from desktop and mobile browsers to simulate a visit to 
Florence (Italy) (Massimo and Ricci 2020). We recruited,
via social media and mailing lists, 158 subjects who had
actually visited Florence before the study. We wanted to
address tourists that are somewhat familiar with the
destination (and its POIs), so that they can better estimate
the quality of next POI recommendations in this city: they
should have already visited some POIs to evaluate the RS’s
suggestions about what to do next. We have designed a
user/system interaction that enables the subjects to reflect
and make choices in a way that is as similar as possible to a
real next-POI visit decision. The experimental system tries to
generate the specific context of a true visit. During the
interaction with the system, the subject is helped to
imagine the real context and make the decisions that would
likely be taken when facing that decision task.

The application first profiles the subjects by asking them 
to list some of the previously visited POIs in that destina- 
tion. This process is facilitated by the presence of pictures 
and descriptions of the POIs (Figure 1). 

Then, a small number of POIs (5 items), among those 
declared to have already been visited, are used to build a 
personalized itinerary that each subject is supposed to have 
completed at the time point when she requests a next POI 
recommendation. Besides, in order to allow Q-BASE and 
Q-POP PUSH to generate recommendations, subjects are 
assigned to one of the pre-existing clusters, which are com- 
puted on the previously acquired tourists’ visit trajectories. 

FIGURE 1 User profiling by POI selection UI


The cluster selected for a target subject is the one that best 
matches the POIs that the subject has declared to have 
already visited. Then, finally, at recommendation time the 
subjects are asked to evaluate a list of next-POI recom- 
mendations generated by mixing the recommendations 
computed by the three evaluated RSs: three recommenda- 
tions for each RS. The subjects were not informed which 
RS recommends what. Hence, a small number of recom- 
mendations are generated, ranging from three, if all the 
three RSs suggest the same POIs, to nine, if they are all 
different. By using a purpose-designed GUI control, the subjects
are then requested to judge if the recommended POIs have
been previously “visited,” are “liked,” or are “novel.” We
aim at eliciting behavioral responses as close as possible
to those in a real condition. The user interface designed for the
evaluation of the recommendations is shown in Figure 2.
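
The construction of the evaluated list can be sketched as below. The function name, the shuffling step, and the bookkeeping of which RS suggested which POI are assumptions made for illustration; the text only states that three recommendations per RS are mixed and that subjects are not told their origin.

```python
import random


def mix_recommendations(lists_by_rs, per_rs=3, seed=None):
    """Merge the top per_rs items of each RS, dropping duplicates, then shuffle.

    Returns the mixed list and a map from each POI to the RSs that suggested it.
    """
    merged = []
    sources = {}
    for rs_name, recs in lists_by_rs.items():
        for poi in recs[:per_rs]:
            if poi not in sources:
                merged.append(poi)
            sources.setdefault(poi, set()).add(rs_name)
    random.Random(seed).shuffle(merged)  # assumed step: hide the origin by order
    return merged, sources


# Hypothetical outputs of the three RSs for one subject.
all_same = {rs: ["A", "B", "C"] for rs in ("Q-BASE", "Q-POP PUSH", "SKNN")}
all_different = {
    "Q-BASE": ["A", "B", "C"],
    "Q-POP PUSH": ["D", "E", "F"],
    "SKNN": ["G", "H", "I"],
}
mixed_same, sources_same = mix_recommendations(all_same, seed=0)
mixed_diff, sources_diff = mix_recommendations(all_different, seed=0)
```

Keeping the source map allows each judgment ("visited," "liked," "novel") to be credited to every RS that suggested the POI.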

An important aspect to consider, when discussing the 
results of an online study like this, is surely related to the 
question whether a subject/tourist could express a reli- 
able “like” judgment on a POI that she does not know, 
that is, a “novel” POI, by simply relying on the system’s 
presentation of the POI. In fact, while the other types of 
feedback (“visited” and “novel”) are very likely to be cor- 
rectly formulated, unless the tourist has forgotten some of 
the previous visit experiences, the “like” judgment is only 
a subjective signal that the tourist expects to have a reward- 
ing (future) experience when visiting the recommended 
POI. Clearly, a liked POI may or may not result in a sat- 
isfying visit (rewarding), and, even more importantly, not 
liked POIs can still produce satisfying visits, when they are 
actually visited. 

The obtained results are shown in Table 2. We measured 
the probability that a subject marks as “visited,” “novel,” 
“liked,” or both “liked” and “novel” a POI recommended 
by a specific RS. Probabilities are estimated by dividing the
total number of items marked as visited (liked, novel, and
both liked and novel), for each RS, by the total number of
recommendations offered by the RS.

It is clear that Q-BASE recommends POIs that are less
likely to have been already visited by the subject, and
more likely to be novel, compared to those suggested
by Q-POP PUSH and SKNN. Interestingly, Q-POP PUSH and
SKNN perform similarly, which seems to be connected to
the popularity bias of both methods. These results match
the offline study results. This is not always the case:
offline results often diverge from online ones, because
different properties of the RS are measured in the two testing
scenarios (Chen et al. 2017; Gunawardana and Shani 2015).
However, we must also note that
Q-BASE offers fewer POIs that are liked, compared to the
other two recommendation strategies. Hence, apparently,
Q-BASE, by trying to optimize the reward, is not equally
able to produce recommendations that the subject likes.
The rationale is that most of the Q-BASE recommendations are
actually novel, that is, the subject does not have an
opinion about these items when they are presented. Therefore,
the subject must understand whether she likes them or
not solely on the basis of the provided information and
explanation. This is complex and makes it difficult for the
subject to formulate an assessment of the expected
satisfaction for the future visit experience, which is supposed
to determine the “like” evaluation. Despite this fact, it is
interesting to note that Q-BASE generates more
recommendations that are both liked and novel (“Liked and Novel”
feedback), so, when a recommended POI is equally novel
for all the three RSs, if it is suggested by Q-BASE, then it
is more often liked. This matches well the main goal of a
tourist RS: letting tourists discover novel POIs that, when
visited, will produce a satisfying experience. Still, we stress
that the evaluation is based only on the subject’s
estimation of the true value of the recommendation, since POIs
are here evaluated before they are experienced.

FIGURE 2 Evaluation GUI. From top to bottom: itinerary detail; info box; recommendations and item details

TABLE 2 Probability to evaluate a POI recommendation as visited, novel, liked, and both novel and liked

Recommender system   Visited   Novel    Liked    Liked and novel
Q-BASE               0.165*    0.517*   0.361*   0.091
Q-POP PUSH           0.245     0.376    0.464    0.076
SKNN                 0.238     0.371    0.466    0.082

* indicates a significant difference from the performance of the other two RSs (two-proportion z-test, p < 0.05).
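
The significance test named in the table note can be sketched with the standard pooled two-proportion z-test. The counts used below are hypothetical and serve only to exercise the function; the numbers of judged recommendations per RS are not reported in this section.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 50 of 100 items marked "novel" for one RS, 30 of 100 for another.
z, p_value = two_proportion_z_test(50, 100, 30, 100)
```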


By summarizing the results of the study, we derive the
following conclusions. The POI-visit suggestions
generated by SKNN and Q-POP PUSH are liked more than those
produced by Q-BASE, because both RSs tend to
recommend items that are less novel than those recommended by
Q-BASE. Moreover, Q-BASE, in the attempt to optimize the
reward function and suggest items that have the
properties typically liked by the user, does not care for the item
popularity and often recommends novel POIs, which are
hard to appreciate. In fact, when the popularity bias is
added to Q-BASE, that is, by using the hybrid model Q-POP
PUSH, this IRL-based RS can produce results similar to those
of SKNN.

Hence, this study illustrates a common “dilemma” in
tourist RSs: tourists tend to like more the items they
are familiar with, even POIs that have been previously
visited, but useful recommendations are for items that
are novel, which tend to be liked less. In fact, by
analyzing the experimental data, we discovered that for all
the three RSs, the probability that a user likes a
recommended POI that she has already visited tends to be much
larger than the probability to like a novel one. This is
confirmed by the outcome of a post-survey in which
participants declared that it is difficult to like something that
is novel and unknown. This points out two main issues
to be considered in the online evaluation of an RS.
First, it is unclear how users can judge items that they
have not yet experienced. Second, it is unclear how an
evaluation based on the user-perceived (expected) utility
for an item can measure the actual utility that the user
will gain in the real experience with the recommended
item.


OPEN CHALLENGES FOR TOURISM
RECOMMENDER SYSTEMS


We argue that in order to build effective tourism RSs,
it is urgent to focus on the true needs of the users. We
must develop models that are able to conceptualize what
makes a POI worth visiting, which implies that they
must properly structure the available knowledge. This
can enable the RS to learn which POIs tourists consume
and how. The field has not yet achieved the level of
development of other types of RSs because the research 
has not yet addressed its specific requirements and con- 
straints (Werthner et al. 2015). Tourists that seek recom- 
mendations should be able to discover new POIs to visit: 
these are the POIs that they cannot easily find by them- 
selves, for example, by using existing travel portals/guides, 
which generically suggest popular and highly rated items. 
We argue that tourism RSs should avoid recommending
blockbuster POIs, or at least accompany such POI
recommendations with others that are novel, are perceived as
worth trying, and will actually produce rewarding
experiences. To identify these items, we need to further study
methods that are able to correctly estimate the quality of 
the experience that the tourist can gain by visiting a POI. 
As it emerged from our research, tourists struggle to judge 
POIs that are new to them even when they have a high esti- 
mated reward, that is, they fit the preferences learned by 
mining their observed behavior. This clearly suggests that 
there is a need to identify solutions to give users the ability 
to better assess the value of those items. We believe that it 
is important to focus even more on explanation methods 
for recommendations (Zhang and Chen 2020), especially 
approaches that can leverage the structural properties of 
IRL models (Ermon et al. 2015). For instance, we believe
that by utilizing proper knowledge to represent the
observed POI-visit trajectories and then by learning the
reward function for each POI visit and the associated
POI-visit selection policy, we can then employ this information
to devise explanation styles (Kouki et al. 2019) that can
point out how and why the tourist should make the
recommended visit choices (Jameson et al. 2014). In this way,
it could be possible to build a more “persuasive”
(conversational) system that nudges the user to accept and
understand the proposed recommendations, and better
helps the user to evaluate the expected satisfaction of a visit
to a possibly unknown POI.

A second aspect that the research on tourism RSs has 
to better discuss and consolidate is a proper evaluation 
approach. First of all, it is important to employ datasets
of users’ behavioral data (e.g., ratings or choices) and
item descriptions (domain knowledge) that are
representative of the real behaviors and interests of the tourists.
Many existing data sets, including the ones that we have
used, offer a partial description of user behavior, and
they focus on special users in restricted group sets (e.g.,
location-based social networks). Moreover, which
distinct POIs should be considered and recommended is
not obvious: new tourism services are continuously
generated (Werthner and Klein 1999) and what is understood
as a target POI by certain tourists is not even recognized
as a POI by others. For instance, in our post user study
survey, it emerged that the database of POIs that we employed
was presenting items that were not easily identified as
clear touristic landmarks by most of the subjects. In
particular, many relatively small POIs (e.g., the door of a
church) would be better collapsed into a single broader
POI (the church itself). This highlights the importance
of a better definition of what is an item to be
recommended, that is, an item that the tourist can judge as a
worthy choice.

Furthermore, we would like to note that a promising
way to overcome the obstacles in designing and
running live user studies is offered by counterfactual learning
methods (Agarwal et al. 2017; Gilotte et al. 2018;
Swaminathan and Joachims 2015). These techniques make it
possible to assess offline a new recommendation strategy as if
it had been deployed and tested online, by means of a user
study. This is implemented by using an existing data set,
as in normal offline studies, but after having debiased the
observations actually present in the data set. This avoids
overestimating, in the observed users’ behavior, choices
that are not proper signals of users’ preferences but are
rather influenced by the recommendations the subject was
exposed to (while the logged data were collected). Hence,
one can re-weight the relevancy of certain observed
POI-visit actions and eventually mitigate their importance, so
that a more precise estimate of the true reward brought by
different POIs can be computed. This “debiased” reward
can then be used to assess offline the performance of a
novel RS strategy without the burden of deploying it in
an online system. This is termed counterfactual learning
and it brings a specific benefit: it makes it possible to bypass
the difficulties related to setting up a proper user study by
allowing researchers to quantify the same objective in offline
experiments as if they had involved real users.
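
One common estimator in this family is inverse propensity scoring (IPS), which re-weights each logged observation by how likely the new strategy and the logging system were to produce it. The cited works are not quoted here as prescribing this exact form; the sketch below, its function names, and its logged data are assumptions made for illustration.

```python
def ips_estimate(logged, target_policy):
    """Inverse propensity scoring estimate of a new strategy's average reward.

    logged: list of (context, action, reward, logging_propensity), where
    logging_propensity is the probability with which the logging system
    showed that action in that context.
    target_policy(context, action): probability that the new strategy
    would choose the action in the same context.
    """
    total = 0.0
    for context, action, reward, propensity in logged:
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logged)


# Hypothetical log collected by a system showing POI "A" or "B" with equal chance.
logged = [
    ("c1", "A", 1.0, 0.5),
    ("c1", "B", 0.0, 0.5),
    ("c2", "A", 1.0, 0.5),
    ("c2", "B", 1.0, 0.5),
]


def always_a(context, action):
    """Candidate strategy that always recommends POI "A"."""
    return 1.0 if action == "A" else 0.0


def always_b(context, action):
    """Candidate strategy that always recommends POI "B"."""
    return 1.0 if action == "B" else 0.0


value_a = ips_estimate(logged, always_a)
value_b = ips_estimate(logged, always_b)
```

Observations that the logging system produced often but the new strategy would rarely produce receive small weights, which is the re-weighting of observed POI-visit actions described above.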

Besides, we believe that it is still an open research 
question how to correlate offline metrics, not only pre- 
cision, to the perceived qualities and experience of the 
recommended item. By better understanding the fac- 
tors that make a recommendation satisfactory for a 
user, we could operationalize offline metrics that quan- 
tify those factors. Hence, this can help to link real 
perceptions to quantifiable offline properties of the 
recommendations. 